FOG Dynamic Metric Implementation (Build Faster/Artifact Build/Web Extensions)
Categories
(Toolkit :: Telemetry, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox103 | --- | fixed |
People
(Reporter: chutten, Assigned: chutten)
References
Details
(Whiteboard: [telemetry:fog:m?])
Attachments
(12 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Following the design from bug 1662863, implement, test, and document dynamic metric support for FOG.
(This may end up being a metabug)
This covers:
- Supporting automatic dynamic registration of metrics for use by Web Extensions and Build Faster/Artifact Builds
- Tests
- Documentation to act as the authoritative and living evolution of the design doc
This does not cover:
- Manual dynamic registration at runtime of metrics not described in a metrics.yaml
Assignee | ||
Comment 1•3 years ago
|
||
Picking this up to look into it again. Additional uses of such a mechanism now include
- Dynamic Telemetry: where some yet-to-be-designed system on a server sends configurations to a yet-to-be-designed system in Firefox Desktop which will need to register new metrics at runtime
Scope-wise we'll obviously not be supporting Dynamic Telemetry until it exists, but we should ensure what we build can be adapted without too much difficulty in that sort of direction. I've also decided to skip writing in support for Web Extensions in this first implementation (I'll file a follow-up for it) to keep work from ballooning. (This is because extensions that want to use the builtin Glean to report information haven't manifested in the past year, with most using Glean.js because it's just that easy).
Looks as though a skeleton will look like:
t/c/g/moz.build
- If we don't haveCONFIG['COMPILE_ENVIRONMENT']
we can trigger a build script to generate the necessary runtime metrics files in a runtime-accessible location. (If we do haveCONFIG['COMPILE_ENVIRONMENT']
we should delete same).- Likely does not support Build Faster where we have a compile environment, but we don't want to use it.
t/c/g/build_scripts/
- New script to gather the various and severalmetrics.yaml
andpings.yaml
files and put them in a accessible location (cf.generate_JSON_definitions
int/c/g/build_scripts/gen_{scalar|histogram|event}_data.py
)- Should be testable in
t/c/g/tests/pytest
- Should be testable in
t/c/g/bindings/Glean{Pings}.cpp
- TheNamedGetter
is the first necessary voluntary call into theGlean
orGleanPings
global that presages the use of a metric or ping that may have been added since the last compile. This is the latest we can guarantee that we're still before the first use of an artefact-build-only metric or ping. Here's where we'll need to load those runtime metrics and pings files- This is on the main thread. Which sucks.
- We'll need to do this I/O and processing synchronously. Which also sucks.
- However, we can limit this to local developer builds by checking
!MOZILLA_OFFICIAL
(like we do to determine whether to disable upload)- Optionally we could also check
MOZ_UPDATE_CHANNEL
==
"default"
like Telemetry does, but we should probably stick to the same criteria we use for disable upload.
- Optionally we could also check
t/c/g/bindings/artefact/
(we might want a submodule here for organization) It'll own the storage and have APIs for adding new records and retrieval (notice that we need to enumerate names inCategory
as well as support lookups by fully-qualified name). TheCategory
- With an
extern "C"
API for adding new records, we can write the file I/O and processing in a Rust crate, which is where I'd prefer it.yaml-rust
andserde_yaml
are already in tree, after all (if we decide to use YAML for the runtime metrics and pings files).
- With an
Sprinkle with tests, soak in documentation (developer and user), bake at 314F for 15.926min, and I think we'll be good to go.
Assignee | ||
Comment 2•3 years ago
|
||
Nope, I'm wrong. Because of course I didn't say anything about what lies beneath. To be fair, neither does the design. (Who wrote that, anyway)
That'll get us the strings and let us create the C++ metrics instances with fresh metrics ids, but any API calls against those instances will hit FFI and immediately panic when there's nothing in __glean_metric_maps
by that id. We could noop things at the C++ layer, but then Artefact Builds that add instrumentation can't run tests, which discourages writing tests, which is a Bad Thing.
So we'll need to go deeper, creating Rust metrics instances and placing them somewhere the FFI macros can find them. These instances will be near-identical to the ones that the Rust consumers will be using by name (meaning we'll be duplicating some memory here. Good thing we're limiting it to developer builds (for now, and to individual runtime-declared metrics in the future)) differing only in the differences that the instrumentor made to the instrumentation and that new metric id.
Why else is this Rust stuff important? That's where IPC lives. And we have enough privileged JS in non-main processes that you bet your buns we'll need it.
In summary, we'll need APIs on the fog
crate for registering new at-runtime metrics. The file processing crate will call that API to fill the Rust maps (MetricId -> Metric Instance, per metric type (cf __glean_metric_maps
) around the same time it's calling the C API to fill the C++ maps (Fully Qualified Metric Name -> metric_entry_t
(packed id + type), Category string list, Metrics string list.).
Labeled Metrics
As always, these are going to be a pain. Not least because MIN_LABELED_SUBMETRIC_ID
is taking the "2^27 bit is on" encoding scheme from the design doc. I think we'll give it the 2^26th bit so we can have 2^25 static metrics sharing amongst them another 2^25 submetrics, and 2^25 runtime-defined metrics sharing amongst them a different 2^25 submetrics. 2^25 is still 33.5M, so we should have plenty of space (for now).
Now I think that might cover it. No broad changes to the Design, just some augmentation. But we'll see what happens with the code starts hitting the compiler.
Assignee | ||
Comment 3•3 years ago
|
||
Assignee | ||
Comment 4•3 years ago
|
||
Storing what amounts to being a pointer and a length is kinda weird considering
we only ever operate on the string data it means.
nsDependentCString is a teensy bit larger (stores the string as a pointer
(size_t width) instead of as a uint32_t index, stores data flags in a
uint16_t, and some class flags in another uint16_t. So, altogether, we're
going from 4-byte Category instances to (on systems with 64-bit pointers)
10-byte Category instances.) but it:
- Gives us better methods (makes it harder for me to get my pointer
arithmetic wrong) - Is more flexible, which we'll need when we start using strings
that aren't in the global category string table
Depends on D143044
Assignee | ||
Comment 5•3 years ago
|
||
Depends on D143045
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 6•3 years ago
|
||
Not a lot works in non-main processes, but having an ability to, from xpcshell,
register metrics at runtime would simplify testing more than a smidge.
Depends on D143046
Assignee | ||
Comment 7•3 years ago
|
||
We're gonna try our best to keep most of the JOG stuff in bindings/jog.
We won't always be able to. For example:
In order for the created metrics instances to be available to e.g. FFI,
we'll have to codegen them into the fog
crate.
Assignee | ||
Comment 8•3 years ago
|
||
Pieces include
- A script for collecting all the ids for metric types
- A template for generating the factory that can build each of those types
- mozbuild integration
Assignee | ||
Comment 9•3 years ago
|
||
We go from one place where metrics can be (the static maps) to two
(we throw in the factory's dynamic maps (warehouses?))
(( Actually from two to three when you consider the submetric maps ))
Thanks to how we architected things, the necessary macro changes are nice and
small.
Assignee | ||
Comment 10•3 years ago
|
||
I feel as though there's a refactor where both IPC and FFI can reuse the same
macros, but for now this is a fairly copy-pasta-heavy change.
Depends on D143049
Assignee | ||
Comment 11•3 years ago
|
||
This provides the necessary C++ glue between the Rust factory (and its FFI) and
the JS layer that'll be using this, and also the necessary storage.
Assignee | ||
Comment 12•3 years ago
|
||
Adds a test-only method to JS that permits the runtime registration of metrics.
Also uses that to cover JOG with tests: registering and smoke-testing metrics
of each metric type.
(Events being a notable (temporary) exception)
I don't like writing parsers, but here I am: writing parsers.
This is not designed to be ergonomic, just complete.
No one should be using this except to test the internals, so that's okay:
we're only hurting ourselves : |
Assignee | ||
Comment 13•3 years ago
|
||
Depends on D143051
Assignee | ||
Comment 14•3 years ago
|
||
These patches aren't a complete solution, so there'll be some follow-ups to file. Specific notable omissions presently include:
event
metrics (due to the list of allowed extra keys needing to be static in the SDK at present)- Pings
- Artefact Build Support
- Collecting, collating, writing existing metrics definitions to a single file in the objdir when there's no
COMPILE_ENVIRONMENT
- (Removing ^ when there is a
COMPILE_ENVIRONMENT
to ensure we don't register at runtime things we codegen'd at build time) - Reading ^ synchronously on the main thread when the first metric is looked retrieved in JS
- Registering the metrics so read.
- Collecting, collating, writing existing metrics definitions to a single file in the objdir when there's no
- Tests and docs for all (especially
#define
s in use)
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 15•3 years ago
|
||
Depends on D144967
Comment 16•3 years ago
|
||
Comment 17•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/70a4a8e1c94e
https://hg.mozilla.org/mozilla-central/rev/bcb8d3298895
https://hg.mozilla.org/mozilla-central/rev/b890543b9677
https://hg.mozilla.org/mozilla-central/rev/13f5ec04f25b
https://hg.mozilla.org/mozilla-central/rev/1b95ed4b63a9
https://hg.mozilla.org/mozilla-central/rev/a21f4220cf2a
https://hg.mozilla.org/mozilla-central/rev/202051f1fee2
https://hg.mozilla.org/mozilla-central/rev/5979019fc8b3
https://hg.mozilla.org/mozilla-central/rev/b4c517c88073
https://hg.mozilla.org/mozilla-central/rev/752f332862e3
https://hg.mozilla.org/mozilla-central/rev/6d33f8bea52a
https://hg.mozilla.org/mozilla-central/rev/8c7c5b07fdff
Comment 18•3 years ago
|
||
I'm seeing "TODO: implement" being printed when doing an artifact build on macOS. Is it possible that's coming from https://searchfox.org/mozilla-central/rev/5644fae86d5122519a0e34ee03117c88c6ed9b47/toolkit/components/glean/build_scripts/glean_parser_ext/jog.py#99 , and is there already a bug on file to track, uh, implementing that? And/or shutting up the warning? :)
Comment 19•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #18)
I'm seeing "TODO: implement" being printed when doing an artifact build on macOS. Is it possible that's coming from https://searchfox.org/mozilla-central/rev/5644fae86d5122519a0e34ee03117c88c6ed9b47/toolkit/components/glean/build_scripts/glean_parser_ext/jog.py#99 , and is there already a bug on file to track, uh, implementing that? And/or shutting up the warning? :)
Probably bug 1767055. chutten's back next week.
Is it only printed once or multiple times? If really too annoying we can get rid of that print.
Comment 20•3 years ago
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #19)
(In reply to :Gijs (he/him) from comment #18)
I'm seeing "TODO: implement" being printed when doing an artifact build on macOS. Is it possible that's coming from https://searchfox.org/mozilla-central/rev/5644fae86d5122519a0e34ee03117c88c6ed9b47/toolkit/components/glean/build_scripts/glean_parser_ext/jog.py#99 , and is there already a bug on file to track, uh, implementing that? And/or shutting up the warning? :)
Probably bug 1767055. chutten's back next week.
Is it only printed once or multiple times?
AFAICT only once, and only when doing the complete build (./mach build
), not for subsequent ./mach build faster
.
If really too annoying we can get rid of that print.
It's not so much annoying as a bit mystifying. It would be nice to have a more meaningful message, especially if there are consequences to the resulting build that are developer-relevant (e.g. are glean metrics going to be broken?).
Comment 21•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #20)
(In reply to Jan-Erik Rediger [:janerik] from comment #19)
If really too annoying we can get rid of that print.
It's not so much annoying as a bit mystifying. It would be nice to have a more meaningful message, especially if there are consequences to the resulting build that are developer-relevant (e.g. are glean metrics going to be broken?).
Yeah, I think that was an oversight when this landed just before chutten went out. He's back next week, so probably we can wait until then.
Either by then we make that message nicer, remove it or we'll land the full work really
Assignee | ||
Comment 22•3 years ago
|
||
Whoopsie. Sorry about that. It shouldn't be printed at all and I missed it in the logspam. bug 1767055 is indeed where that'll be implemented, but it's a big'un so lemme file a small'n'quick bug to remove this print in the meantime.
Description
•